(54) Title: INTERNET BROWSER

(57) Abstract: The Internet (210) is searched (204-208) in order to find resources (212; 213) that provide streamable audio such
as live Internet broadcasts. The resources (212; 213) are identified (202), e.g., based on their file extension, and are categorized (222)
according to, e.g., the natural language or music style. The user is enabled to browse (230) the collection based on textual or graphical
input.
Internet browser.



FIELD OF THE INVENTION 

The invention relates in particular to a method for categorizing web sites that
provide audio broadcasts over the Internet.

BACKGROUND ART

Currently, nearly 10,000 radio stations broadcast over the Internet. These
stations stream their audio content. A streamed file is a file that can be started for playing out
before the download is completed. With a proper network connection and with decoding and
playback software on one's PC or set top box, the audio can be captured. Audio output
hardware, e.g., analog sound cards and USB speakers, and streaming media tools such as
RealPlayer from RealNetworks, Inc., have become widely available and make it possible to
add radio functionality to one's PC.

SUMMARY OF THE INVENTION 

The user has to check many listings of large numbers of web sites that provide
these broadcasts over the Internet. Accordingly, there is a need for helping the user to select
from among the huge number of stations available. To this end, a first aspect of the invention
provides a method and device for categorizing web sites or resources on the Internet that
provide audio (e.g., speech and music) streaming based on their typical content. Other
aspects of the invention include a method of and a device for locating at least a specific one
of multiple Internet resources that provide streamable audio content, and a searchable
database as defined in the independent claims. The dependent claims define advantageous
embodiments.

A web resource that provides audio streaming is identified by its resource
type. The resource type is determined by way of the type extension in its URL that indicates
the file format, e.g., ".ram", ".tsp" or ".swa". This extension makes it possible, for example, to
automatically open the proper software applications (or "plug-ins") in the user's browser
when the hyperlink is clicked. Accordingly, the relevant resources on the Internet can be
identified based on their URL. If the file extension is not available through the URL, the
resource type is determined by the MIME type or content-type information provided in the
HTTP header of the resource. Taking into consideration the resource's country domain
extension, e.g., ".nl" for the Netherlands or ".ru" for Russia, further optimizes the analysis of
the URL, for example if one is interested in audio content in a specific natural language.
Upon finding a relevant resource, i.e., one that provides streaming of audio, the resource's
file is retrieved from the relevant server and analyzed based on its audio content. In a
preferred embodiment, the inventor proposes to use speech recognition or music
(tune/rhythm) recognition software to search through and categorize these stations by, e.g.,
language, style of music, absence of commercials. Speech recognition software is capable of
determining the signature of various kinds of music, thus allowing categorization of music
with just this kind of software. For example, classical music has typically a different speech
recognition signature than rock music. A server can be dedicated to categorize stations or
channels in a database, similar to what PlanetSearch or AltaVista does for text documents.
One or more web crawlers can be used in parallel to automatically fetch web sites that supply
audio so as to identify them for a search engine. Additionally, the resource's server can be
evaluated by the crawler for the quality of the connection, e.g., connection speed, reliability,
etc. For example, the categorizing server may recommend higher connection speed sources
to a user who has broadband network access (e.g., ISDN, cable, T1). An audio browser is
provided, analogous to PlanetSearch's or AltaVista's for text, to provide a searchable
collection of Internet audio web sites from which specific pages are returned to the
user based on certain audio search criteria. Alternatively, the catalog approach (Yahoo
experts hand-pick and assign sites to categories) can be taken to categorize the stations at the
server and make them accessible through a search engine. Once the sites are categorized, a
user provides a query input to the server and receives a list of URLs representative of the
channels that match the query input (e.g., give me a French language station that plays music
like this). As an alternative to or in support of this, the server provides a customized electronic
program guide to the user based on a profile of the user stored on the server, e.g., using the
SmartConnect infrastructure of Philips Electronics.
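
The recommendation of higher connection speed sources to a broadband user can be illustrated with a minimal sketch. The `Source` record, the `kbps` and `reliability` fields, the reliability threshold and the example URLs are all assumptions made for illustration; the text only states that the crawler evaluates connection speed and reliability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    url: str
    kbps: float         # connection speed measured by the crawler (assumed unit)
    reliability: float  # fraction of successful probes, 0.0 to 1.0 (assumed metric)


def recommend(sources, user_kbps, min_reliability=0.9):
    """Return reliable sources that fit the user's bandwidth, fastest first."""
    usable = [
        s for s in sources
        if s.reliability >= min_reliability and s.kbps <= user_kbps
    ]
    # A broadband user therefore sees the higher-speed sources at the top.
    return sorted(usable, key=lambda s: s.kbps, reverse=True)


sources = [
    Source("http://a.example/x.ram", 28.8, 0.99),
    Source("http://b.example/y.ram", 128.0, 0.95),
    Source("http://c.example/z.ram", 300.0, 0.50),  # fast but unreliable
]
broadband = recommend(sources, user_kbps=1544.0)
modem = recommend(sources, user_kbps=56.0)
```

With these made-up figures, the broadband user is offered the 128 kbps source first, while the modem user is offered only the 28.8 kbps source.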

The invention is of commercial interest in particular to, e.g., cable providers
and network owners, and service providers in order to serve as an incentive for subscribers.

As to music recognition, see, for example, U.S. Patent 5963957 (Attorney
Docket PHA 23,241), herein incorporated by reference. This patent document discusses,
among other things, how rhythm information or tonal information of a musical theme can be
used to identify the theme. The rhythm information comprises the time signature (meter) and
the accentuations of the theme. The time signature determines the number of beats to the
measure. The accentuation determines which beat gets an accent and which one does not.
For example, the sign 6/8 in a musical score is the time signature indicating that the meter is 6
beats to the measure and that an eighth note gets one beat. Flamenco music has a variety of
different styles, each determined by its own compas (rhythmic accentuation pattern). Typical
examples of flamenco music are Alegrias, Bulerias, Siguiriyas and Soleares that all have 12
beats to the measure. In the Alegrias, Bulerias and Soleares, the third, sixth, eighth, tenth and
twelfth beats are accentuated. The first, third, fifth, eighth and eleventh beats are emphasized
in the Siguiriyas style. In this system rhythmic accentuation patterns are used as input data in
order to retrieve bibliographic information associated with the theme that is represented by
the rhythm. For example, the rhythmic accentuation pattern is entered into the system as a
substantially monotonic sequence of accentuated and unaccentuated sounds. The input data
then is represented by, e.g., a sequence of beats or peaks of varying height in the time
domain. The relative distances between successive peaks represent the temporal aspects of
the pattern and the relative heights represent the accentuations in the pattern. The sequence of
beats and rests in between is represented by a digital word. The words can be stored
lexicographically to enable a fast and orderly retrieval. If tonal information and/or rhythm
information can be used to identify individual musical themes, they can also be used to
identify with more or less accuracy a certain style of music.
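
As a rough illustration of representing an accentuation pattern by a digital word and storing the words lexicographically, the sketch below encodes one measure as a bit string and looks it up by binary search. The one-bit-per-beat encoding is an assumption made here for simplicity; it is not the encoding of the cited patent, and the function names are hypothetical. The accent positions are the ones quoted in the text.

```python
import bisect


def pattern_to_word(beats, accented):
    """Encode one measure as a bit string: '1' = accented beat, '0' = unaccented."""
    return "".join("1" if b in accented else "0" for b in range(1, beats + 1))


# Accent positions quoted in the text, 12 beats to the measure.
COMPAS = {
    "Alegrias/Bulerias/Soleares": {3, 6, 8, 10, 12},
    "Siguiriyas": {1, 3, 5, 8, 11},
}

# Lexicographically sorted (word, style) pairs allow binary search.
INDEX = sorted((pattern_to_word(12, acc), style) for style, acc in COMPAS.items())


def lookup(word):
    """Return the style stored under `word`, or None if the word is unknown."""
    i = bisect.bisect_left(INDEX, (word, ""))
    if i < len(INDEX) and INDEX[i][0] == word:
        return INDEX[i][1]
    return None
```

Because the list is sorted, a lookup takes logarithmic time in the number of stored words, which is the practical benefit of the lexicographic ordering mentioned above.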
As to SmartConnect, see, for example, the non-prepublished PCT application
WO-A-00/17789 (attorney docket PHA 23,500), herein incorporated by reference. This
document relates to a server system that maintains a user profile of a particular end-user of
consumer electronics network-enabled equipment and a data base of new technical features
for this type of equipment, e.g., a home network. If there is a match between the user-profile
and a new technical feature, and the user has indicated a preference for receiving information about
updates or sales offers, the user gets notified via the network of the option to obtain the
feature.

As to SmartConnect, also see the non-prepublished US patent application
Serial No. 09/189,535 (attorney docket PHA 23,527) filed 11/10/98 for Yevgeniy Shteyn for
UPGRADING OF SYNERGETIC ASPECTS OF HOME NETWORKS, herein incorporated
by reference. This document relates to a system with a server that has access to an inventory
of devices and capabilities on a user's home network. The inventory is, for example, a look-up
service as provided by HAVi, JINI and Home API architectures. The server has also
access to a data base with information of features for a network. The server determines if the
synergy of the apparatus present on the user's network can be enhanced based on the listing
of the inventory and on the user's profile. If there are features that are relevant to the synergy,
based on these criteria, the user gets notified.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is explained by way of example and with reference to the
accompanying drawings, wherein:

Fig. 1 is a flow diagram illustrating a method in the invention;
Fig. 2 is a block diagram of a system for use in the invention; and
Fig. 3 is a block diagram of part of the system of Fig. 2.

Throughout the figures, same reference numerals indicate similar or
corresponding features.

PREFERRED EMBODIMENTS

Fig. 1 is a flow diagram 100 with the main steps in a method according to the
invention.

In step 102, a first, or the next, web resource is identified based on its URL.
The resource type of the current URL is determined in step 104 to find out in step 106 if the
resource has an audio streaming format. For example, in step 106 the URL is checked for the
presence of a file extension that indicates streamable audio. If the URL does not have such a
file extension, the resource is opened and the resource type is extracted, e.g., content type
information or MIME type information is extracted from the HTTP header of the resource. If
the resource does have a resource type that is compatible with an audio streaming format, the
resource is retrieved in step 108. If it does not have a streaming format, the process returns to
step 102 to get the next URL. In step 110, the resource opened in step 108 is analyzed
based on its audio content. For example, the rhythm signature is used to
determine the style of a musical theme, or the language of an oral presentation is determined
through speech recognition in step 112 in order to assign the resource to a specific category
in step 114. A web site thus identified is preferably visited a number of times, in order to get a
statistically relevant average profile for a more accurate indexing under a specific category or
for automatically determining a category by clustering resources with a similar profile. This
may especially be relevant to sites that provide live radio broadcasts. The so-called web-crawler
or spider technology can be used for scanning the relevant sites and feeding them to a
dedicated search engine that performs the content analysis.
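
Steps 102 to 108 can be sketched as follows, under stated assumptions. The extension set comes from the text; the MIME type set is an assumed example, since the text does not enumerate any. `fetch_content_type` is a hypothetical callable supplied by the caller that returns the Content-Type header value for a URL (or None), so the sketch runs without network access.

```python
import posixpath
from urllib.parse import urlparse

STREAMING_EXTENSIONS = {".ram", ".tsp", ".swa"}   # extensions named in the text
STREAMING_MIME_TYPES = {"audio/x-pn-realaudio"}   # assumed example, not from the text


def is_streamable_audio(url, fetch_content_type):
    """Steps 104/106: check the URL extension, else fall back to the HTTP header."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext in STREAMING_EXTENSIONS:
        return True
    # No streaming extension in the URL: inspect the Content-Type header.
    content_type = fetch_content_type(url) or ""
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in STREAMING_MIME_TYPES


def select_resources(urls, fetch_content_type):
    """Loop over steps 102-108: keep only the URLs that pass the type check."""
    return [u for u in urls if is_streamable_audio(u, fetch_content_type)]


# Made-up header table standing in for real HTTP responses.
headers = {
    "http://radio.example/live": "audio/x-pn-realaudio; charset=binary",
    "http://radio.example/index.html": "text/html",
}
selected = select_resources(
    list(headers) + ["http://radio.example/a.tsp"], headers.get
)
```

The resources selected this way would then be handed to the content analysis of steps 110 to 114, which is outside the scope of this sketch.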
Fig. 2 illustrates this in more detail with reference to a block diagram of a
system 200. System 200 comprises a dedicated server 202 that sends out multiple spiders
204, 206, 208 over the Internet 210 to visit Web sites. As known, a spider, also referred to
as a 'web crawler', is a software program that fetches Web pages and analyzes their content
in order to generate searchable, indexed catalogs for a search engine. Web sites and specific
pages can be visited and indexed selectively. A typical Web page includes one or more
hyperlinks to other Web pages. Therefore, a spider can start almost from anywhere and hop
from Web page to Web page following the links it encounters while being out there on the
Internet. Each of spiders 204-208 visits HTML pages and scans them for clickable links that
indicate the presence of resources for streamable audio. In this sense, an audio spider, i.e., the
entity that specifically looks for audio links, may ride piggyback with a conventional spider
or crawler that scans text-based information. Currently, popular formats for streaming audio
include RealAudio (file extension ".ram") from RealNetworks, Inc., TrueSpeech (file
extension ".tsp") from DSP Group, Inc., and Macromedia's Shockwave for Director (file
extension ".swa"). Links that have these extensions are relevant to spiders 204-208.
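
The link scanning performed by an audio spider can be sketched with Python's standard `html.parser` module. This is only an illustration: the class and function names are hypothetical, and a real spider would also fetch pages, queue the non-audio links for further crawling, and respect crawl policies.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

AUDIO_EXTENSIONS = {".ram", ".tsp", ".swa"}  # extensions named in the text


class AudioLinkParser(HTMLParser):
    """Collects hyperlinks, separating streamable-audio links from the rest."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.audio_links = []   # candidates for content analysis
        self.other_links = []   # pages the spider may hop to next

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        if ext in AUDIO_EXTENSIONS:
            self.audio_links.append(url)
        else:
            self.other_links.append(url)


def scan_page(base_url, html):
    """Return (audio_links, other_links) found in one HTML page."""
    parser = AudioLinkParser(base_url)
    parser.feed(html)
    return parser.audio_links, parser.other_links


page = (
    '<a href="/live.ram">Live</a> <a href="about.html">About</a> '
    '<a name="x">no href</a> <a href="http://other.example/jazz.SWA">Jazz</a>'
)
audio, other = scan_page("http://radio.example/index.html", page)
```

Keeping the non-audio links separately reflects the piggyback arrangement described above, in which a conventional crawler follows ordinary links while the audio spider only picks out the streaming ones.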

Once a spider, e.g., spider 206, has identified a resource 212 (based on its
hyperlink 214) that provides audio streaming, it fetches the data and causes the content to be
analyzed, e.g., by server 202, based on the content's pattern. The content's pattern is
analyzed using, for example, automated speech recognition methods 216, or automated music
pattern or rhythmic pattern analysis 218 as discussed above. Based on the results of this
analysis, the content is indexed by an indexer 220 in a data base 222 as relating to a certain
natural language or to one or more music styles. Alternatively, or subsidiarily, human experts 224
listen to the content associated with the links thus identified by the spiders and categorize the
audio by hand in data base 222. As a side remark, note that the number of audio sites on the
Internet is large, but smaller by many orders of magnitude than the number of textual and/or
graphics sites. Accordingly, it pays off to have the audio links scanned or data
base 222 reviewed by human experts. Server 202 provides a search engine 226 to search data
base 222 for specific concepts requested by a user via his/her client 228 that has a browser
230. For example, the user requests audio sites that supply a live radio broadcast in the
Spanish language. The user submits his/her request that has the terms "live" and "Spanish"
(or the Spanish equivalent thereof) in it. Sites that provide Spanish spoken programs are
recognized by their language, e.g., via speech recognition 216. Sites that provide a live
broadcast are recognized as such by experts 224, or may be automatically identified by
listening in on the repeated mentioning, throughout the day and at certain regular time
intervals, of the date and time, as is the case with most news services. Alternatively or
subsidiarily, the meta tags associated with the HTML pages containing the audio hyperlinks
or associated with the audio pages themselves contain the expression "live" (or the equivalent
in another language). The intersection of the set of audio sites that provide "live" broadcasts
with the set of audio sites that provide audio comprising Spanish language is the set in which
the requester is interested. Similarly, if the user is interested in audio streaming of a certain
piece of music, he or she may provide the input in a format as disclosed in U.S. patent
5963957 (PHA 23,241), discussed above, to determine if there is a matching resource
available on the Internet. If the user is interested in sites that supply a certain style of music,
he or she may submit a request to search engine 226 in a textual format, such as: "get me the
sites that provide French chansons from the fifties". The relevant terms here are: "French",
"chansons" and "fifties", based on which data base 222 is queried. Whether or not a certain
chanson is from the time period indicated could have been added as an entry to data base 222
by experts 224. Alternatively, or as an additional support, the music data base discussed in
U.S. patent 5963957 (PHA 23,241) is used to intervene and to convert the user's request into
a request for specific music titles queried in the manner specified.
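
The "live" and "Spanish" example amounts to intersecting the sets of sites indexed under each query term. A minimal sketch of such an inverted index follows; the `AudioIndex` class is a hypothetical stand-in for data base 222 and search engine 226, and the URLs and category terms are made up.

```python
from collections import defaultdict


class AudioIndex:
    """Inverted index: category term -> set of hyperlinks."""

    def __init__(self):
        self._by_term = defaultdict(set)

    def add(self, url, terms):
        """Index one resource under each of its category terms."""
        for term in terms:
            self._by_term[term.lower()].add(url)

    def query(self, terms):
        """Return the sorted URLs indexed under every query term."""
        sets = [self._by_term.get(t.lower(), set()) for t in terms]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


index = AudioIndex()
index.add("http://a.example/noticias.ram", ["live", "Spanish"])
index.add("http://b.example/archivo.ram", ["Spanish"])
index.add("http://c.example/news.ram", ["live", "English"])
hits = index.query(["live", "Spanish"])
```

Only the first site is indexed under both terms, so it is the only hit. Whether a term was assigned by speech recognition, by meta tags or by a human expert makes no difference to the query step.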

Fig. 3 is a block diagram of a part 300 of system 200. In this example data base
222 is comprised of first, second and third portions 302, 304 and 306, respectively. Portion
302 comprises a musical themes database, wherein musical themes (sequence of notes,
rhythmic signature, etc.) are stored. Portion 304 stores bibliographic information items
associated with musical themes in portion 302. Portion 306 stores a data base with hyperlinks
associated with Internet resources that provide audio. The user supplies to search engine 226
a certain tune or beat pattern as input 308. This input information is supplied to data base
portion 302 to determine if there is a match between the musical information supplied by the
user and one or more themes stored in portion 302. Upon finding one or more matches, the
corresponding bibliographic information items are retrieved, optionally for display to the
user. The bibliographic information items enable running a query in the audio documents
indexed in data base portion 306. Upon finding one or more matches, the search engine
returns to the user at an output 310 the corresponding hyperlinks.
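
The chain from input 308 through portions 302, 304 and 306 to output 310 can be sketched with three lookup tables. All keys, titles and URLs below are made up, the bit-string pattern encoding is a hypothetical choice, and exact-match lookup stands in for whatever matching the real theme database would perform.

```python
# Portion 302: musical themes, keyed by an encoded pattern (encoding assumed).
THEMES = {"001001010101": "theme-1", "101010010010": "theme-2"}

# Portion 304: bibliographic information items per theme (made-up entries).
BIBLIO = {
    "theme-1": {"title": "Example Alegrias"},
    "theme-2": {"title": "Example Siguiriyas"},
}

# Portion 306: hyperlinks to audio resources, indexed by title (made-up URLs).
LINKS = {"Example Alegrias": ["http://radio.example/alegrias.ram"]}


def locate(pattern):
    """Input 308 -> portion 302 -> portion 304 -> portion 306 -> output 310."""
    theme_id = THEMES.get(pattern)
    if theme_id is None:
        return None, []          # no matching theme
    item = BIBLIO[theme_id]      # bibliographic item, optionally displayed
    return item, LINKS.get(item["title"], [])
```

A theme can match in portion 302 and still yield no hyperlinks, which corresponds to a known piece of music that no indexed Internet resource currently streams.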
It should be noted that the above-mentioned embodiments illustrate rather than
limit the invention, and that those skilled in the art will be able to design many alternative
embodiments without departing from the scope of the appended claims. In the claims, any
reference signs placed between parentheses shall not be construed as limiting the claim. The
word 'comprising' does not exclude the presence of other elements or steps than those listed
in a claim. The invention can be implemented by means of hardware comprising several
distinct elements, and by means of a suitably programmed computer. In a device claim
enumerating several means, several of these means can be embodied by one and the same
item of hardware. The mere fact that certain measures are recited in mutually different
dependent claims does not indicate that a combination of these measures cannot be used to
advantage.
CLAIMS:



1. A method of categorizing resources (212; 213) on the Internet (210), which
resources (212; 213) provide streamable audio content, the method comprising analyzing
(110) the audio content per individual one of the resources to obtain an indexed collection of
the resources.

2. The method of claim 1, wherein the categorizing comprises:

- identifying (106) a specific one of the resources;

- analyzing the type of audio content according to one or more predetermined criteria;

- indexing (112) the specific resource to one or more particular ones of multiple categories
based on the analyzing.

3. The method of claim 2, wherein the analyzing of the type comprises analyzing
(106) a URL of the specific resource.

4. The method of claim 2, wherein the analyzing of the type comprises analyzing
an HTTP header of the specific resource.

5. The method of claim 2, wherein at least one criterion is associated with a
particular natural language of the content (216).

6. The method of claim 2, wherein at least one criterion is associated with a style
of music of the content (218).



7. A method of locating at least a specific one of multiple Internet resources
(212; 213) that provide streamable audio content, the locating comprising:

- supplying input information representative of the specific audio content to a search engine
(226);

- requesting the search engine to query a data base (222), comprising an indexed collection of
streamable audio documents associated with the Internet resources, based on the input
information supplied;

- if the search engine finds one or more matching ones among the documents, receiving one
or more hyperlinks (214) associated with the matching.

8. The method of claim 7, wherein the supplying comprises providing textual
information to the search engine.

9. The method of claim 7, wherein the supplying comprises providing musical
information to the search engine.

10. The method of claim 9, wherein:

- the database comprises:

  - a first set (302) of respective reference data representing respective reference
sequences of reference musical components of respective ones of multiple musical themes;

  - a second set (304) of respective bibliographic information items
corresponding with respective ones of the multiple musical themes; and

- the search engine is coupled to an input (308) for receiving the input information
representative of an input sequence of input musical components;

- the search engine is operative to identify one or more particular bibliographic information
items upon finding a match between one or more reference data and the input information;
and

- the locating comprises finding a match between the one or more particular bibliographic
information items and one or more of the audio documents.

11. A method of enabling to identify a specific one of multiple resources (212;
213) on the Internet (210), which resources supply streamable audio, the enabling
comprising:

- providing a searchable data base (222) comprising hyperlinks representative of an indexed
collection of the resources;

- providing a search engine (226) for querying the data base upon a user having submitted a
query item; and

- providing at least one of the hyperlinks upon the search engine finding a match between the
query item and the data base.

12. A searchable data base (222) comprising hyperlinks in an indexed collection
of Internet resources that provide streamable audio.

13. A device for categorizing resources (212; 213) on the Internet (210), which
resources (212; 213) provide streamable audio content, the device comprising:

- means for analyzing (110) the audio content per individual one of the resources, to obtain an
indexed collection of the resources.

14. A device for locating at least a specific one of multiple Internet resources
(212; 213) that provide streamable audio content, the device comprising:

- means for supplying input information representative of the specific audio content to a
search engine (226);

- means for requesting the search engine to query a data base (222) comprising an indexed
collection of streamable audio documents associated with the Internet resources, based on the
input information supplied;

- means for, if the search engine finds one or more matching ones among the documents,
receiving one or more hyperlinks (214) associated with the matching.

15. A device for enabling to identify a specific one of multiple resources (212;
213) on the Internet (210), which resources supply streamable audio, the device comprising:

- means for providing a searchable data base (222) comprising hyperlinks representative of
an indexed collection of the resources;

- means for providing a search engine (226) for querying the data base upon a user having
submitted a query item; and

- means for providing at least one of the hyperlinks upon the search engine finding a match
between the query item and the data base.